Abstract

We consider fully-online construction of indexing data structures for multiple texts. Let $\mathcal{T} = \{T_1, \ldots, T_K\}$ be a collection of texts. By fully-online, we mean that a new character can be appended to any text in $\mathcal{T}$ at any time. This is a natural generalization of semi-online construction of indexing data structures for multiple texts, in which, once a new character is appended to the $k$th text $T_k$, the previous texts $T_1, \ldots, T_{k-1}$ remain static. Our fully-online scenario arises when we index multi-sensor data. We propose fully-online algorithms which construct the directed acyclic word graph (DAWG) and the generalized suffix tree (GST) for $\mathcal{T}$ in $O(N \log \sigma)$ time and $O(N)$ space, where $N$ and $\sigma$ denote the total length of texts in $\mathcal{T}$ and the alphabet size, respectively.


1 Introduction

Text indexing is a fundamental problem in computer science, which plays important roles in many applications including text retrieval, molecular biology, signal processing, and sensor data analysis. In this paper, we focus on indexing a collection of multiple texts, so that subsequent pattern matching queries can be answered quickly. In particular, we study online indexing for a collection $\mathcal{T}$ of multiple texts, where a new character can be appended to each text at any time. Such fully-online indexing for multiple growing texts has potential applications to continuous processing of data streams, where a number of symbolic events or data items are produced from multiple, rapid, time-varying, and unbounded data streams Elllo]. For example, a motif mining system tries to discover characteristic or interesting collective behaviors, such as frequent paths or anomalies, from data streams generated by a collection of moving objects or sensors [TUlfT^.

It is known that suffix trees m and DAWGs [3] can be constructed for a collection of growing texts 
in the semi-online setting, where only the last inserted text can be grown. However, these existing semi- 
online algorithms to maintain a suffix tree or a DAWG for multiple texts are not sufficient to construct 
indexing structures for multiple data streams which grow in a fully-online manner. 

We show how the DAWG and the suffix tree can be incrementally constructed for a fully-online text collection. First, we observe that Blumer et al.’s construction [5] for DAWGs and Weiner’s right-to-left construction m for suffix trees can readily be adapted to solve this problem. Hence, at any moment during the fully-online growth of the texts, we can find all $occ$ occurrences of a given pattern of length $M$ in the current text collection in $O(M \log \sigma + occ)$ time.

Our next goal is to extend Ukkonen’s construction m to fully-online left-to-right construction of suffix trees for multiple texts. A motivation for this goal is that a growing suffix tree can be enhanced with powerful semi-dynamic tree data structures such as those for nearest marked ancestor (NMA) queries [14], lowest common ancestor (LCA) queries [7], and level ancestor (LA) queries [T]. Note that these data structures cannot be applied to DAWGs, and that the same query results cannot be obtained on the suffix tree maintained in a Weiner-like right-to-left manner, since the suffix tree obtained in this manner inherently indexes the reversed texts in the collection. However, it turns out that this goal is a big algorithmic challenge, because: (A) In Ukkonen’s algorithm, a pointer called the active point keeps track of the insertion points of suffixes in decreasing order of length. The efficiency of Ukkonen’s algorithm is due to the monotonicity of the tracking path of the active point. Unfortunately, this monotonicity does not hold in our fully-online construction for multiple texts. (B) Due to the non-monotonicity mentioned above, Ukkonen’s technique to amortize the cost of tracking the suffix insertion points does not work in our case. (C) Ukkonen’s “open edge” technique to maintain the leaves does not work in our case, either. In Section 5 we will explain in more detail why and how these problems arise in our fully-online setting. In this paper, we present a number of novel techniques to overcome all the difficulties above. As a final result, we propose the first optimal $O(N \log \sigma)$-time $O(N)$-space fully-online left-to-right construction algorithm for a suffix tree of multiple texts over a general ordered alphabet of size $\sigma$, where $N$ is the final total length of the texts.

Figure 1: Illustration for $\mathit{STrie}(\mathcal{T})$, $\mathit{STree}(\mathcal{T})$, and $\mathit{DAWG}(\mathcal{T})$ with $\mathcal{T} = \{T_1 = \mathtt{aaab}, T_2 = \mathtt{ababc}, T_3 = \mathtt{bab}\}$. The solid arrows and broken arrows represent the edges and the suffix links of each data structure, respectively. The number $k$ ($k = 1, 2, 3$) beside each node indicates that the node represents a suffix of $T_k$. The nodes $[\mathtt{ab}]_{\mathcal{T}}$ and $[\mathtt{b}]_{\mathcal{T}}$ are separated in $\mathit{DAWG}(\mathcal{T})$ since the node $\mathtt{bab}$ in $\mathit{STrie}(\mathcal{T})$ represents a suffix of $T_3$, while the node $\mathtt{abab}$ does not (see also the subtrees rooted at nodes $\mathtt{ab}$ and $\mathtt{b}$ in $\mathit{STrie}(\mathcal{T})$).

Related work: We note that a fully-online text index for multiple texts can be obtained from existing, more general dynamic text indices as follows. For the index of Ferragina and Grossi [5], which permits character-wise updates, we first build a master text $\$_1 \cdots \$_K$ consisting of $K$ delimiters. Then, appending a character $a$ to the $k$th text in the collection reduces to prepending $a$ to the $k$th delimiter $\$_k$. Using this approach, the index of Ferragina and Grossi [5] takes $O(N \log N)$ total time to be constructed, requires $O(N \log N)$ space, and allows pattern matching in $O(M + \log N + N \log M + occ)$ time. For the compressed index for a dynamic text collection of Chan et al. |^, we can append a new character $a$ to the $k$th text $T_k$ by removing $T_k$ and then adding $T_k a$ in $O(|T_k|)$ time. This yields a fully-online index with $O(N^2 \log N)$ construction time and $O(N)$ bits of space (or $O(N / \log N)$ words of space assuming an $O(\log N)$-bit machine word), supporting pattern matching in $O(M \log N + occ \log^2 N)$ time.


2 Preliminaries 

Strings: Let $\Sigma$ be a general ordered alphabet. Any element of $\Sigma^*$ is called a string. For any string $T$, let $|T|$ denote its length. Let $\varepsilon$ be the empty string, namely, $|\varepsilon| = 0$. If $T = XYZ$, then $X$, $Y$, and $Z$ are called a prefix, a substring, and a suffix of $T$, respectively. For any $1 \le i \le j \le |T|$, let $T[i..j]$ denote the substring of $T$ that begins at position $i$ and ends at position $j$ in $T$. For any $1 \le i \le |T|$, let $T[i]$ denote the $i$th character of $T$. For any string $T$, let $\mathit{Suffix}(T)$ denote the set of suffixes of $T$, and for any set $\mathcal{T}$ of strings, let $\mathit{Suffix}(\mathcal{T})$ denote the set of suffixes of all strings in $\mathcal{T}$. Namely, $\mathit{Suffix}(\mathcal{T}) = \bigcup_{T \in \mathcal{T}} \mathit{Suffix}(T)$.

For any string $T$, let $\overline{T}$ denote the reversed string of $T$, i.e., $\overline{T} = T[|T|] \cdots T[1]$.

Let $\mathcal{T} = \{T_1, \ldots, T_K\}$ be a collection of $K$ texts. For any $1 \le k \le K$, let $\mathit{lrs}_{\mathcal{T}}(T_k)$ be the longest suffix of $T_k$ that occurs at least twice in $\mathcal{T}$.
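The definitions above can be made concrete with a brute-force sketch. It is only an illustration under our own assumptions: the function names are hypothetical, strings are 0-indexed Python strings, overlapping occurrences are counted separately, and the empty string is returned when no non-empty suffix repeats.

```python
def suffixes(text):
    """All suffixes of `text`, including the empty suffix."""
    return {text[i:] for i in range(len(text) + 1)}


def collection_suffixes(texts):
    """Suffix(T): the union of the suffix sets of all texts in the collection."""
    result = set()
    for t in texts:
        result |= suffixes(t)
    return result


def occurrences(x, texts):
    """Number of (text, start position) pairs at which x occurs; overlaps count."""
    return sum(1 for t in texts
               for i in range(len(t) - len(x) + 1) if t.startswith(x, i))


def lrs(texts, k):
    """Longest suffix of texts[k] that occurs at least twice in the collection.

    Assumption: the empty string is returned if no non-empty suffix repeats.
    """
    t = texts[k]
    for start in range(len(t)):          # longest candidates first
        if occurrences(t[start:], texts) >= 2:
            return t[start:]
    return ""


texts = ["aaab", "ababc", "bab"]          # the collection of Figure 1
print(lrs(texts, 0), repr(lrs(texts, 1)), lrs(texts, 2))
```

This runs in time polynomial in the total length and is meant only to pin down the definitions, not to reflect the efficient algorithms discussed later.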

Suffix trees and DAWGs for multiple texts: The suffix trie for a text collection $\mathcal{T} = \{T_1, \ldots, T_K\}$, denoted $\mathit{STrie}(\mathcal{T})$, is a trie which represents $\mathit{Suffix}(\mathcal{T})$. The size of $\mathit{STrie}(\mathcal{T})$ is $O(N^2)$, where $N$ is the total length of texts in $\mathcal{T}$. We identify each node $v$ of $\mathit{STrie}(\mathcal{T})$ with the string that $v$ represents. A substring $x$ of a text in $\mathcal{T}$ is said to be branching in $\mathcal{T}$ if there exist two distinct characters $a, b \in \Sigma$ such that both $xa$ and $xb$ are substrings of some texts in $\mathcal{T}$. Clearly, node $x$ of $\mathit{STrie}(\mathcal{T})$ is branching iff $x$ is branching in $\mathcal{T}$. For each node $av$ of $\mathit{STrie}(\mathcal{T})$ with $a \in \Sigma$ and $v \in \Sigma^*$, let $\mathit{slink}(av) = v$. This auxiliary edge $\mathit{slink}(av) = v$ from $av$ to $v$ is called a suffix link.
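As a small illustration of these definitions, the sketch below enumerates the nodes of the suffix trie as plain strings and tests the branching condition directly. The helper names are hypothetical, and the quadratic-size node set is built explicitly only because the example is tiny.

```python
def trie_nodes(texts):
    """Nodes of STrie(T): every distinct substring, including the empty one."""
    return {t[i:j] for t in texts
            for i in range(len(t) + 1) for j in range(i, len(t) + 1)}


def is_branching(x, nodes):
    """x is branching iff xa and xb are both substrings for distinct a, b."""
    next_chars = {y[-1] for y in nodes
                  if len(y) == len(x) + 1 and y.startswith(x)}
    return len(next_chars) >= 2


def slink(node):
    """Suffix link of a non-root trie node av: drop the first character a."""
    return node[1:]


nodes = trie_nodes(["aaab", "ababc", "bab"])   # the collection of Figure 1
branching = sorted(x for x in nodes if is_branching(x, nodes))
print(branching)
```

Every suffix link target is again a node of the trie, because any suffix of a substring is itself a substring.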

The suffix tree [13] for a text collection $\mathcal{T}$, denoted $\mathit{STree}(\mathcal{T})$, is a “compacted trie” which represents $\mathit{Suffix}(\mathcal{T})$. $\mathit{STree}(\mathcal{T})$ is obtained by compacting every path of $\mathit{STrie}(\mathcal{T})$ which consists of non-branching internal nodes (see Fig. 1). Since every internal node of $\mathit{STree}(\mathcal{T})$ is branching, and since there are at most $N$ leaves in $\mathit{STree}(\mathcal{T})$, the numbers of edges and nodes are $O(N)$. The edge labels of $\mathit{STree}(\mathcal{T})$ are non-empty substrings of some text in $\mathcal{T}$. By representing each edge label $x$ with a triple $(k, i, j)$ of integers such that $x = T_k[i..j]$, $\mathit{STree}(\mathcal{T})$ can be stored in $O(N)$ space. We say that any branching (resp. non-branching) substring of $\mathcal{T}$ is an explicit node (resp. implicit node) of $\mathit{STree}(\mathcal{T})$. An implicit node $x$ is represented by a triple $(v, a, \ell)$, called a reference to $x$, such that $v$ is an explicit ancestor of $x$, $a$ is the first character of the path from $v$ to $x$, and $\ell$ is the length of the path from $v$ to $x$. A reference $(v, a, \ell)$ to node $x$ is called canonical if $v$ is the lowest explicit ancestor of $x$. For each node $av$ of $\mathit{STree}(\mathcal{T})$ with $a \in \Sigma$ and $v \in \Sigma^*$, let $\mathit{slink}(av) = v$.

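The directed acyclic word graph [ail] of a text collection $\mathcal{T}$, denoted $\mathit{DAWG}(\mathcal{T})$, is a smallest DAG which represents $\mathit{Suffix}(\mathcal{T})$. $\mathit{DAWG}(\mathcal{T})$ is obtained by merging identical subtrees of $\mathit{STrie}(\mathcal{T})$ connected by the suffix links (see Fig. 1). Hence, the label of every edge of $\mathit{DAWG}(\mathcal{T})$ is a single character. The numbers of nodes and edges of $\mathit{DAWG}(\mathcal{T})$ are $O(N)$ |a, and hence $\mathit{DAWG}(\mathcal{T})$ can be stored in $O(N)$ space. $\mathit{DAWG}(\mathcal{T})$ can be defined formally as follows: For any string $x$, let $\mathit{Epos}_{\mathcal{T}}(x)$ be the set of ending positions of $x$ in the texts in $\mathcal{T}$, i.e., $\mathit{Epos}_{\mathcal{T}}(x) = \{(k, j) \mid x = T_k[j - |x| + 1..j],\ 1 \le j \le |T_k|,\ 1 \le k \le K\}$. Consider an equivalence relation $\equiv_{\mathcal{T}}$ on substrings $x, y$ of texts in $\mathcal{T}$ such that $x \equiv_{\mathcal{T}} y$ iff $\mathit{Epos}_{\mathcal{T}}(x) = \mathit{Epos}_{\mathcal{T}}(y)$.

For any substring $x$ of texts of $\mathcal{T}$, let $[x]_{\mathcal{T}}$ denote the equivalence class w.r.t. $\equiv_{\mathcal{T}}$. There is a one-to-one correspondence between each node $v$ of $\mathit{DAWG}(\mathcal{T})$ and each equivalence class $[x]_{\mathcal{T}}$, and hence we will identify each node $v$ of $\mathit{DAWG}(\mathcal{T})$ with its corresponding equivalence class $[x]_{\mathcal{T}}$. Let $\mathit{long}([x]_{\mathcal{T}})$ denote the longest member of $[x]_{\mathcal{T}}$. By the definition of equivalence classes, $\mathit{long}([x]_{\mathcal{T}})$ is unique for each $[x]_{\mathcal{T}}$, and every member of $[x]_{\mathcal{T}}$ is a suffix of $\mathit{long}([x]_{\mathcal{T}})$. If $x, xa$ are substrings of texts in $\mathcal{T}$ with $x \in \Sigma^*$ and $a \in \Sigma$, then there exists an edge labeled with character $a$ from node $[x]_{\mathcal{T}}$ to node $[xa]_{\mathcal{T}}$. This edge is called primary if $|\mathit{long}([x]_{\mathcal{T}})| + 1 = |\mathit{long}([xa]_{\mathcal{T}})|$, and is called secondary otherwise. For each node $[x]_{\mathcal{T}}$ of $\mathit{DAWG}(\mathcal{T})$ with $|x| \ge 1$, let $\mathit{slink}([x]_{\mathcal{T}}) = [y]_{\mathcal{T}}$, where $y$ is the longest suffix of $\mathit{long}([x]_{\mathcal{T}})$ which does not belong to $[x]_{\mathcal{T}}$. In the example of Fig. 1, $[\mathtt{aaab}]_{\mathcal{T}} = \{\mathtt{aaab}, \mathtt{aab}\}$. The edge labeled with $\mathtt{b}$ from node $[\mathtt{aaa}]_{\mathcal{T}}$ to node $[\mathtt{aaab}]_{\mathcal{T}}$ is primary, while the edge labeled with $\mathtt{b}$ from $[\mathtt{aa}]_{\mathcal{T}}$ to node $[\mathtt{aaab}]_{\mathcal{T}}$ is secondary. $\mathit{slink}([\mathtt{aaab}]_{\mathcal{T}}) = [\mathtt{ab}]_{\mathcal{T}}$.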
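The equivalence classes can be checked by brute force on the collection of Fig. 1. The sketch below is an illustration only: the function names are ours, positions are reported 1-based to match the text, and the classes are recomputed from scratch rather than maintained incrementally.

```python
from collections import defaultdict


def epos(x, texts):
    """Ending positions (k, j), both 1-based, of x in the collection."""
    return frozenset((k + 1, i + len(x))
                     for k, t in enumerate(texts)
                     for i in range(len(t) - len(x) + 1) if t.startswith(x, i))


def dawg_classes(texts):
    """Group all non-empty substrings by their Epos sets (one group per node)."""
    substrings = {t[i:j] for t in texts
                  for i in range(len(t)) for j in range(i + 1, len(t) + 1)}
    classes = defaultdict(set)
    for x in substrings:
        classes[epos(x, texts)].add(x)
    return classes


def slink_target(x, texts):
    """Longest suffix of long([x]) that lies outside the class [x]."""
    members = dawg_classes(texts)[epos(x, texts)]
    longest = max(members, key=len)
    for start in range(1, len(longest) + 1):
        if longest[start:] not in members:
            return longest[start:]


def is_primary(x, a, texts):
    """Edge [x] --a--> [xa] is primary iff |long([x])| + 1 = |long([xa])|."""
    classes = dawg_classes(texts)
    long_x = max(classes[epos(x, texts)], key=len)
    long_xa = max(classes[epos(x + a, texts)], key=len)
    return len(long_x) + 1 == len(long_xa)


texts = ["aaab", "ababc", "bab"]
print(sorted(dawg_classes(texts)[epos("aaab", texts)]), slink_target("aaab", texts))
```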

The following fact follows from the definition of branching substrings: 

Fact 1. For any substring $x$ of texts in $\mathcal{T}$, node $x$ is branching (explicit) in $\mathit{STree}(\mathcal{T})$ iff node $[x]_{\mathcal{T}}$ is branching in $\mathit{DAWG}(\mathcal{T})$.

Fully-online text collection: We consider a collection $\{T_1, \ldots, T_K\}$ of $K$ growing texts, where each text $T_k$ ($1 \le k \le K$) is initially the empty string $\varepsilon$. Given a pair $(k, a)$ of a text id $k$ and a character $a \in \Sigma$, which we call an update operator, the character $a$ is appended to the $k$th text of the collection. For a sequence $U$ of update operators, let $U[1..i]$ denote the sequence of the first $i$ update operators in $U$ with $0 \le i \le |U|$. Also, for $0 \le i \le |U|$, let $\mathcal{T}_{U[1..i]}$ denote the collection of texts which have been updated according to the first $i$ update operators of $U$. For instance, consider a text collection of three texts which grow according to the following sequence $U = (1, a), (2, b), (2, a), (3, a), (1, a), (3, c), (3, b), (2, b), (1, a), (1, b), (3, c), (3, b), (1, c), (3, b), (2, c)$ of 15 update operators. Then,

$$\mathcal{T}_{U[1..15]} = \{\, T_1 = \overset{1}{a}\,\overset{5}{a}\,\overset{9}{a}\,\overset{10}{b}\,\overset{13}{c},\quad T_2 = \overset{2}{b}\,\overset{3}{a}\,\overset{8}{b}\,\overset{15}{c},\quad T_3 = \overset{4}{a}\,\overset{6}{c}\,\overset{7}{b}\,\overset{11}{c}\,\overset{12}{b}\,\overset{14}{b} \,\},$$

where the superscript $i$ over each character $a$ in the $k$th text implies that $U[i] = (k, a)$. For instance, $U[15] = (2, c)$ and hence $c$ was appended to the 2nd text $T_2 = bab$ in $\mathcal{T}_{U[1..14]}$, yielding $T_2 = babc$ in $\mathcal{T}_{U[1..15]}$.

If there is no restriction on $U$, as in the example above, then $U$ is called fully-online. If there is a restriction on $U$ such that once a new character is appended to the $k$th text, no characters will be appended to its previous $k - 1$ texts, then $U$ is called semi-online. Hence, any semi-online sequence of update operators is of the form $(1, T_1[1]), \ldots, (1, T_1[|T_1|]), \ldots, (K, T_K[1]), \ldots, (K, T_K[|T_K|])$.
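The update model can be sketched in a few lines. The helper names below are hypothetical; text ids are 1-based as in the text, and the semi-online test simply checks that text ids never decrease along the sequence.

```python
def apply_updates(num_texts, updates):
    """Apply update operators (k, a), with 1-based text id k, to empty texts."""
    texts = [""] * num_texts
    for k, a in updates:
        texts[k - 1] += a
    return texts


def is_semi_online(updates):
    """True iff text ids never decrease, i.e. earlier texts stay static."""
    return all(k1 <= k2 for (k1, _), (k2, _) in zip(updates, updates[1:]))


# The 15 update operators of the running example.
U = [(1, "a"), (2, "b"), (2, "a"), (3, "a"), (1, "a"), (3, "c"), (3, "b"),
     (2, "b"), (1, "a"), (1, "b"), (3, "c"), (3, "b"), (1, "c"), (3, "b"),
     (2, "c")]
print(apply_updates(3, U), is_semi_online(U))
```

Applying only the first 14 operators leaves the second text as `bab`, which matches the remark on $U[15]$ above.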

Section 3 reviews previous algorithms which incrementally construct the DAWG and the suffix tree for a growing text collection in the semi-online setting. Sections 4 and 5 propose our new algorithms which incrementally construct the DAWG and the suffix tree for a text collection in the fully-online setting, respectively.


3 Semi-online construction algorithms 

Blumer et al.’s semi-online DAWG construction algorithm: We recall Blumer et al.’s algorithm [3] which incrementally builds $\mathit{DAWG}(\mathcal{T}_U)$ for a given semi-online sequence $U$ of update operators of length $N$. Since $U$ is semi-online, at each step $i$ ($0 \le i \le N$) of the semi-online update, there exists a unique $k$ ($1 \le k \le K$) such that $T_1, \ldots, T_{k-1}$ will be static for all the following $i'$th steps ($i < i' \le N$), $T_k$ is now growing from left to right, and $T_{k+1}, \ldots, T_K$ are still the empty strings. Assume that $U[i] = (k, a)$, and hence a new character $a$ is appended to the $k$th text in the collection at the $i$th step. For ease of notation, let $\mathcal{T}' = \mathcal{T}_{U[1..i-1]}$ and $\mathcal{T} = \mathcal{T}_{U[1..i]}$. Also, assume that $\mathit{DAWG}(\mathcal{T}')$ has already been constructed. In updating $\mathit{DAWG}(\mathcal{T}')$ to $\mathit{DAWG}(\mathcal{T})$, we have to ensure that all suffixes of the extended text $T_k a$ will be represented by $\mathit{DAWG}(\mathcal{T})$. These suffixes are categorized into three different types (see also Fig. 2 in Appendix A):

Type-1 The suffixes of $T_k a$ that are longer than $\mathit{lrs}_{\mathcal{T}'}(T_k)a$.

Type-2 The suffixes of $T_k a$ that are not longer than $\mathit{lrs}_{\mathcal{T}'}(T_k)a$ and are longer than $\mathit{lrs}_{\mathcal{T}}(T_k a)$.

Type-3 The suffixes of $T_k a$ that are not longer than $\mathit{lrs}_{\mathcal{T}}(T_k a)$.

Blumer et al.’s algorithm inserts the suffixes of $T_k a$ in decreasing order of length, from the Type-1 ones to the Type-2 ones. By definition, the Type-3 ones are already represented by $\mathit{DAWG}(\mathcal{T}')$, and hence we need not insert them explicitly.
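The three types can be illustrated by a brute-force classifier. This is a sketch under our own assumptions: the function names are hypothetical, text indices are 0-based, only non-empty suffixes are listed, and the two longest repeating suffixes are recomputed naively rather than tracked by an active point.

```python
def occurrences(x, texts):
    """Number of (text, start position) pairs at which x occurs; overlaps count."""
    return sum(1 for t in texts
               for i in range(len(t) - len(x) + 1) if t.startswith(x, i))


def lrs(s, texts):
    """Longest suffix of s occurring at least twice in texts ('' if none)."""
    for start in range(len(s)):
        if occurrences(s[start:], texts) >= 2:
            return s[start:]
    return ""


def classify_suffixes(texts, k, a):
    """Split the non-empty suffixes of texts[k] + a into Type-1, 2 and 3."""
    old = texts[k]
    ext = old + a
    new_texts = texts[:k] + [ext] + texts[k + 1:]
    hi = len(lrs(old, texts)) + 1        # |lrs_{T'}(T_k) a|
    lo = len(lrs(ext, new_texts))        # |lrs_T(T_k a)|
    types = {1: [], 2: [], 3: []}
    for start in range(len(ext)):        # decreasing order of length
        suf = ext[start:]
        if len(suf) > hi:
            types[1].append(suf)
        elif len(suf) > lo:
            types[2].append(suf)
        else:
            types[3].append(suf)
    return types


# Appending 'c' to the first text of the (hypothetical) collection {abab, bc}.
print(classify_suffixes(["abab", "bc"], 0, "c"))
```

In this small example the old longest repeating suffix is `ab` and the new one is `bc`, so exactly one suffix, `abc`, is of Type-2.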

Their algorithm maintains an invariant $v$ which indicates the node $[T_k]_{\mathcal{T}'}$, called the active point, from which the update starts. There are two cases:

1. If there is an out-going edge labeled with $a$ from the active point $v$, then $T_k a = \mathit{lrs}_{\mathcal{T}}(T_k a)$, which implies that all suffixes of $T_k a$ are of Type-3. Let $u$ be the node to which this edge leads. There are two subcases:

(a) If the edge labeled with $a$ is primary, then no updates to the graph topology are needed. The new active point for the next step is on $[\mathit{lrs}_{\mathcal{T}}(T_k a)]_{\mathcal{T}}$.

(b) If the edge labeled with $a$ is secondary, then the graph topology needs to be updated (see Fig. 3 in Appendix A). Since the edge is secondary, every member $Xa$ of $u = [\mathit{lrs}_{\mathcal{T}}(T_k a)]_{\mathcal{T}'}$ that is longer than $T_k a$ is not a suffix of $T_k a$, while every member $Ya$ of $u = [\mathit{lrs}_{\mathcal{T}}(T_k a)]_{\mathcal{T}'}$ that is not longer than $T_k a$ is a Type-3 suffix of $T_k a$. This implies that $\mathit{Epos}_{\mathcal{T}}(\mathit{lrs}_{\mathcal{T}}(T_k a)) \supset \mathit{Epos}_{\mathcal{T}}(Xa)$. By the definition of the nodes of DAWGs (recall Section 2), the node $u$ is split into two nodes $z = [Xa]_{\mathcal{T}}$ and $w = [\mathit{lrs}_{\mathcal{T}}(T_k a)]_{\mathcal{T}}$. First, a new node $w$ is created. All secondary in-coming edges of $u$ corresponding to Type-3 suffixes $Ya$ are redirected to $w$. This can be done by traversing the chain of the suffix links starting from $v$. All the out-going edges of $u$ are copied to $w$. Now, node $w$ is complete, and the node $u$ with its remaining in-coming edges is the other new node $z$. The suffix link of $u$ is inherited by $w$, and the suffix link of $z$ is set to $w$. The new active point for the next step is on node $w$.

2. If there is no out-going edge labeled with $a$ from the active point $v$, then a new sink $s$ is created. The Type-1 suffixes are inserted by making a new edge labeled with $a$ from $v = [T_k]_{\mathcal{T}'}$ to $s$. To insert the Type-2 suffixes, the active point $v$ moves by updating $v \leftarrow \mathit{slink}(v)$. Then the following procedure is repeated until an out-going edge labeled with $a$ from the active point is found: (i) A new edge labeled with $a$ from $v$ to $s$ is created. (ii) The active point $v$ moves by updating $v \leftarrow \mathit{slink}(v)$. The node $u$ where the above procedure ends is $[\mathit{lrs}_{\mathcal{T}}(T_k a)]_{\mathcal{T}'}$, and the new sink $s$ is exactly $[T_k a]_{\mathcal{T}}$, which represents all Type-1 and Type-2 suffixes of $T_k a$. There are two cases:

(a) If the edge labeled with $a$ from the last locus $v$ of the active point to $u$ is primary, then $u = [\mathit{lrs}_{\mathcal{T}}(T_k a)]_{\mathcal{T}}$. Thus no updates to the graph topology are needed. The suffix link of the new sink $s = [T_k a]_{\mathcal{T}}$ is set to $u$.

(b) If the edge labeled with $a$ from the last locus $v$ of the active point to $u$ is secondary, then as in Case 1(b), $u$ is split into two nodes $w$ and $z$, where $w$ represents the members of $u$ that are longer than the longest repeating suffix $\mathit{lrs}_{\mathcal{T}}(T_k a)$ (none of these members is a suffix of $T_k a$), and $z$ represents the members of $u$ which are Type-3 suffixes of $T_k a$. The suffix link of the new sink $s$ is set to $z$.

In both subcases above, the new active point is on the new sink $s = [T_k a]_{\mathcal{T}}$.

It is not difficult to see that if the total number of new nodes, edges, and suffix links is $q$, then the above update takes $O(q \log \sigma)$ time, where the $\log \sigma$ term is due to searching for an out-going edge labeled with $a$. Since no existing nodes, edges, or suffix links are deleted during the updates, and since the size of $\mathit{DAWG}(\mathcal{T}_U)$ is $O(N)$, the amortized time for the update is $O(\log \sigma)$. Hence, $\mathit{DAWG}(\mathcal{T}_U)$ can be constructed in $O(N \log \sigma)$ time and $O(N)$ space in the semi-online setting.

Ukkonen’s semi-online suffix tree construction algorithm: Ukkonen m proposed an algorithm to incrementally construct the suffix tree of a single text. His algorithm can easily be extended to incrementally construct the suffix tree for multiple texts in the semi-online setting.

Let $U$ be a semi-online sequence of $N$ update operators such that the last update operator for each $k$ ($1 \le k \le K$) is $(k, \$_k)$, where $\$_k$ is a special end-marker for the $k$th text in the collection. For ease of notation, let $\mathcal{T}' = \mathcal{T}_{U[1..i-1]}$ and $\mathcal{T} = \mathcal{T}_{U[1..i]}$. Also, assume that we have already constructed $\mathit{STree}(\mathcal{T}')$ and that the next update operator is $U[i] = (k, a)$. Thus a new character $a$ is appended to the $k$th text $T_k$ of $\mathcal{T}'$, and the $k$th text of $\mathcal{T}$ becomes $T_k a$.

As in the case of semi-online DAWG construction, the suffixes of $T_k a$ are inserted in decreasing order of length. The Type-1 suffixes are maintained as follows. Let $s$ be any suffix of $T_k$ which is represented by a leaf of $\mathit{STree}(\mathcal{T}')$. Since $s$ is a non-repeating suffix of $T_k$ in $\mathcal{T}'$, $sa$ is a non-repeating suffix of $T_k a$ in $\mathcal{T}$, which implies that $sa$ will also be a leaf of $\mathit{STree}(\mathcal{T})$. Based on this observation, the label of the in-coming edge of $s$ is represented by a triple $(k, b, \infty)$ called an open edge, where $b$ is the beginning position of the label of the in-coming edge in the $k$th text. This way, every existing leaf is extended automatically. Hence, updating $\mathit{STree}(\mathcal{T}')$ to $\mathit{STree}(\mathcal{T})$ reduces to inserting the Type-2 suffixes of $T_k a$. For this purpose, the algorithm maintains an invariant which indicates the locus of $x = \mathit{lrs}_{\mathcal{T}'}(T_k)$ on $\mathit{STree}(\mathcal{T}')$, called the active point. Since $x$ can be an implicit node, the algorithm maintains the canonical reference $(v, c, \ell)$ to $x$. For convenience, if $x$ is an explicit node, then let its canonical reference be $(x, \varepsilon, 0)$. The update starts from the current active point $x$ represented by its canonical reference, and the Type-2 suffixes of $T_k a$ are inserted in decreasing order of length, by using the chain of (virtual) suffix links. There are two cases:

I. If it is possible to go down from $x$ with character $a$, then no updates to the tree topology are needed. The new active point is $xa$, and the reference to $xa$ is made canonical if necessary. The update ends.

II. If it is impossible to go down from $x$ with character $a$, then we create a new leaf. Let $j$ be the beginning position of the suffix of $T_k a$ which corresponds to this new leaf. The following procedure is repeated until Case I happens.

(a) If the active point $x$ is on an explicit node, then a new leaf node $s$ is created as a new child of $x$, with its incoming edge labeled by $(k, b, \infty)$, where $b = |T_k a| - |x| + 1$. The active point $x$ is updated to $\mathit{slink}(x)$.

(b) If the active point $x$ is on an implicit node, then $x$ becomes explicit in this step. A new leaf node $s$ is created as a new child of $x$ with its incoming edge labeled by $(k, b, \infty)$. Since the suffix link of the new explicit node $x$ does not yet exist, we simulate the suffix link traversal as follows (see also the corresponding figure in Appendix A). Let $(v_j, c_j, \ell_j)$ be the canonical reference to $x$. First, we follow the suffix link $\mathit{slink}(v_j)$ of $v_j$, and then go down along the path of length $\ell_j$ from $\mathit{slink}(v_j)$ starting with character $c_j$. Let this locus be $x'$. Let $v_{j+1}$ be the deepest explicit node on this path. (i) If $|v_{j+1}| = |x'|$, then we first create the new suffix link $\mathit{slink}(x) = v_{j+1}$ for the new explicit node $x$. The active point $x$ is updated to $x'$ and is represented by the canonical reference $(v_{j+1}, \varepsilon, 0)$. (ii) If $|v_{j+1}| < |x'|$, then the next active point is implicit. The active point $x$ is updated to $x'$ and is represented by the canonical reference $(v_{j+1}, c_{j+1}, \ell_{j+1})$. The suffix link of $x$ will be set to $x'$ when $x'$ becomes explicit in the next step.

The most expensive case is II-b-(ii). Since the path from $\mathit{slink}(v_j)$ to $x'$ contains at most $\ell_j - \ell_{j+1}$ explicit nodes, it takes $O((\ell_j - \ell_{j+1} + 1) \log \sigma)$ time to locate the next active point $x'$ (note that $\ell_j - \ell_{j+1} \ge 0$ holds). All the other operations take $O(\log \sigma)$ time. Hence, the total cost to insert all leaves (suffixes) for the $k$th text is $O(\sum_{j=1}^{N_k} (\ell_j - \ell_{j+1} + 1) \log \sigma) = O(N_k \log \sigma)$, where $N_k$ is the final length of the $k$th text. Thus the amortized time cost for each leaf (suffix) for the $k$th text is $O(\log \sigma)$. Overall, it takes a total of $O(N \log \sigma)$ time to construct $\mathit{STree}(\mathcal{T}_U)$ for a semi-online sequence $U$ of update operators. The space requirement is $O(N)$.

4 Fully-online DAWG construction algorithm 

We can easily extend Blumer et al.’s semi-online DAWG construction algorithm to the fully-online setting. Let $U$ be a fully-online sequence of $N$ update operators. Our fully-online algorithm maintains the active point $v_k$ for every growing text $T_k$ in the collection, at any step of the algorithm. Now, assume that we have already constructed $\mathit{DAWG}(\mathcal{T}')$, where $\mathcal{T}' = \mathcal{T}_{U[1..i-1]}$ for $1 \le i \le N$. Let $U[i] = (k, a)$, and suppose that we are updating $\mathit{DAWG}(\mathcal{T}')$ to $\mathit{DAWG}(\mathcal{T})$, where $\mathcal{T} = \mathcal{T}_{U[1..i]}$. The update starts from the active point $v_k = [T_k]_{\mathcal{T}'}$ in exactly the same way as described in Section 3. The total cost to update $\mathit{DAWG}(\mathcal{T}')$ to $\mathit{DAWG}(\mathcal{T})$ is again $O(q \log \sigma)$, where $q$ is the total number of nodes, edges, and suffix links which were introduced in this update. Since the total size of $\mathit{DAWG}(\mathcal{T})$ is $O(N)$, the amortized cost for this update is again $O(\log \sigma)$. By the above arguments, we obtain the following theorem.
Theorem 1. Given a fully-online sequence $U$ of $N$ update operators for a collection of $K$ texts, we can update $\mathit{DAWG}(\mathcal{T}_{U[1..i]})$ for $i = 1, \ldots, N$ in a total of $O(N \log \sigma)$ time and $O(N)$ space.
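The update with one active point per text can be sketched as follows. This is a minimal illustration and not the authors' code: the class and method names are hypothetical, out-going edges are kept in Python dictionaries (hashing) instead of the balanced search trees that give the $O(\log \sigma)$ factor, and node splitting follows the usual clone-based formulation in which the new node takes over the shorter members.

```python
class FullyOnlineDAWG:
    """DAWG of a growing collection; active[k-1] is the node [T_k]."""

    def __init__(self, num_texts):
        self.next = [{}]                 # out-going edges of each node
        self.link = [-1]                 # suffix links (-1 for the root)
        self.length = [0]                # |long(node)|
        self.active = [0] * num_texts    # one active point per text

    def _new_node(self, length, edges, link):
        self.next.append(dict(edges))
        self.length.append(length)
        self.link.append(link)
        return len(self.next) - 1

    def _split(self, p, a, q):
        """Split q; the new node w keeps members of length <= length[p] + 1."""
        w = self._new_node(self.length[p] + 1, self.next[q], self.link[q])
        while p != -1 and self.next[p].get(a) == q:
            self.next[p][a] = w          # redirect secondary edges to w
            p = self.link[p]
        self.link[q] = w
        return w

    def append(self, k, a):
        """Apply the update operator (k, a); k is a 1-based text id."""
        v = self.active[k - 1]
        if a in self.next[v]:            # Case 1: all suffixes are Type-3
            u = self.next[v][a]
            if self.length[v] + 1 != self.length[u]:
                u = self._split(v, a, u)           # secondary edge
            self.active[k - 1] = u
            return
        s = self._new_node(self.length[v] + 1, {}, 0)   # Case 2: new sink
        while v != -1 and a not in self.next[v]:
            self.next[v][a] = s          # Type-1 and Type-2 suffixes
            v = self.link[v]
        if v != -1:
            u = self.next[v][a]
            if self.length[v] + 1 != self.length[u]:
                u = self._split(v, a, u)
            self.link[s] = u
        self.active[k - 1] = s

    def contains(self, pattern):
        """True iff pattern is a substring of some text in the collection."""
        node = 0
        for ch in pattern:
            if ch not in self.next[node]:
                return False
            node = self.next[node][ch]
        return True
```

A usage example with the update sequence of Section 2 would call `append(k, a)` for each operator in order and then query `contains`. The active point of a text stays valid while other texts grow because each text is always the longest member of its own equivalence class, so a split never moves it to the new node.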

A snapshot of fully-online DAWG construction is shown in Appendix A. Assume for now that each text $T_k$ in a collection $\mathcal{T}$ begins with a special character which does not appear elsewhere in $\mathcal{T}$. Then, the tree of the (reversed) suffix links of $\mathit{DAWG}(\mathcal{T})$ forms the suffix tree $\mathit{STree}(\overline{\mathcal{T}})$ for the collection $\overline{\mathcal{T}} = \{\overline{T_1}, \ldots, \overline{T_K}\}$ of the reversed texts of $\mathcal{T}$ [3]. Hence, the next corollary follows from Theorem 1, which gives right-to-left fully-online suffix tree construction.

Corollary 2. Given a fully-online sequence $U$ of $N$ update operators for a collection of $K$ texts, we can update $\mathit{STree}(\overline{\mathcal{T}_{U[1..i]}})$ for $i = 1, \ldots, N$ in a total of $O(N \log \sigma)$ time and $O(N)$ space.


5 Fully-online suffix tree construction algorithm 

Difficulties in fully-online construction of suffix trees: Unlike the case with DAWGs, it is not easy to extend Ukkonen’s semi-online suffix tree construction algorithm to our left-to-right fully-online setting, because:

A. Let $U[i] = (k, a)$, which updates the current $k$th text $T_k$ to $T_k a$, and assume that we have just constructed $\mathit{STree}(\mathcal{T}_{U[1..i]})$. Recall that we defined the initial locus of the active point for $T_k a$ on $\mathit{STree}(\mathcal{T}_{U[1..i]})$ to be the longest repeating suffix of $T_k a$ in $\mathcal{T}_{U[1..i]}$. However, since $U$ is fully-online, any other text $T_h$ ($h \neq k$) in the collection may be updated by subsequent update operators $U[r]$ with $r > i$. Then, the longest repeating suffix of $T_k a$ in $\mathcal{T}_{U[1..r]}$ can be much longer than that of $T_k a$ in $\mathcal{T}_{U[1..i]}$. In other words, some Type-1 suffixes of $T_k a$ in $\mathcal{T}_{U[1..i]}$ can become of Type-2 in $\mathcal{T}_{U[1..r]}$ (see Appendix A for a concrete example). What is worse, updating $T_h$ can affect the longest repeating suffix of any other text in the collection as well. If we maintain all these active points naively, it takes $O(KN \log \sigma)$ time.

B. Even if we somehow manage to efficiently maintain the active point for each text in the collection, there remains another difficulty. Let $j$ be the beginning position of the longest repeating suffix of $T_k a$ in $\mathcal{T}_{U[1..i]}$, and let $(v_j, c_j, \ell_j)$ be the canonical reference to this suffix. Let $U[i'] = (k, a')$ be the first update operator in $U$ which updates the $k$th text after $U[i] = (k, a)$. Let $(v', c', \ell')$ be the canonical reference to the longest repeating suffix of $T_k a$ in $\mathcal{T}_{U[1..i']}$, which is the “real” initial active point where insertion of the Type-2 suffixes should start at this $i'$th step. By the property of suffix trees, $\ell' \ge \ell_j$ holds, and what is worse, this length $\ell'$ is not bounded by the number of Type-2 suffixes inserted at this $i'$th step. Thus, the amortization technique we used for the semi-online construction does not work in the fully-online setting.

C. The phenomenon mentioned in Difficulty A also causes a problem of how to represent the labels of the in-coming edges to the leaves. Assume that we created a new leaf w.r.t. an update operator $(k, a)$, and let $(k, b_k, \infty)$ be the triple representing the label of the in-coming edge to the leaf, where $b_k$ is the beginning position of the edge label in the $k$th text. It corresponds to a Type-1 suffix of the $k$th text, but the leaf can later be extended by another growing text $T_h$. Then, the triple $(k, b_k, \infty)$ has to be updated to $(h, b_h, \infty)$, where $b_h$ is the beginning position of the edge label in the $h$th text (see also Fig. 5 in Appendix A). Notice that this update may happen repeatedly.
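Difficulty A can be observed on a toy instance. The sketch below, with hypothetical helper names and a brute-force definition of the longest repeating suffix, records the length of the longest repeating suffix of one watched text after every update operator, including operators that touch only other texts.

```python
def occurrences(x, texts):
    """Number of (text, start position) pairs at which x occurs; overlaps count."""
    return sum(1 for t in texts
               for i in range(len(t) - len(x) + 1) if t.startswith(x, i))


def lrs(s, texts):
    """Longest suffix of s occurring at least twice in texts ('' if none)."""
    for start in range(len(s)):
        if occurrences(s[start:], texts) >= 2:
            return s[start:]
    return ""


def lrs_history(num_texts, updates, watched):
    """Length of the longest repeating suffix of text `watched` (1-based)
    after each update operator (k, a) has been applied."""
    texts = [""] * num_texts
    history = []
    for k, a in updates:
        texts[k - 1] += a
        history.append(len(lrs(texts[watched - 1], texts)))
    return history


# Text 1 is completed first; text 2 then grows into a copy of it.
U = [(1, "a"), (1, "b"), (1, "c"), (2, "a"), (2, "b"), (2, "c")]
print(lrs_history(2, U, watched=1))
```

The last update appends a character to text 2 only, yet the longest repeating suffix of text 1 jumps from length 0 to length 3, so an active point kept for text 1 would have to move without text 1 being touched.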

Constructing suffix trees with the aid of DAWGs: We utilize DAWGs to overcome Difficulties A, B and C in fully-online construction of suffix trees. Namely, we construct $\mathit{STree}(\mathcal{T})$ in tandem with $\mathit{DAWG}(\mathcal{T})$.

A high-level description of our algorithm is as follows. We insert the Type-2 suffixes of $T_k a$ in increasing order of length, starting from the locus of the longest Type-3 suffix of $T_k a$. The idea of inserting the Type-2 suffixes in increasing order of length was also used by Breslauer and Italiano for quasi real-time left-to-right construction of the suffix tree for a single text. To efficiently find the locus where the next longer Type-2 suffix should be inserted in the tree from the locus where the last Type-2 suffix was inserted, we introduce a simpler amortized variant of the suffix tree oracle of Fischer and Gawrychowski [^. These will overcome Difficulties A and B. To overcome Difficulty C, we introduce a new lazy representation of the labels of edges leading to the leaves.

Lemma 1. We can compute, in amortized $O(\log \sigma)$ time, a canonical reference to the longest Type-3 suffix $\mathit{lrs}_{\mathcal{T}}(T_k a)$ of $T_k a$ on $\mathit{STree}(\mathcal{T}')$, using a data structure which requires space linear in the total length of the texts in $\mathcal{T}$.
Proof. We introduce the longest path tree of $\mathcal{T}'$, denoted $\mathit{LPT}(\mathcal{T}')$, which is the spanning tree of $\mathit{DAWG}(\mathcal{T}')$ consisting only of the primary edges of $\mathit{DAWG}(\mathcal{T}')$. Every node of $\mathit{LPT}(\mathcal{T}')$ is marked iff its corresponding node on $\mathit{DAWG}(\mathcal{T}')$ is branching. Every marked node of $\mathit{LPT}(\mathcal{T}')$ is linked to its corresponding node of $\mathit{STree}(\mathcal{T}')$, which is also branching by Fact 1 (see Fig. 7 in Appendix A). $\mathit{LPT}(\mathcal{T}')$ is enhanced with the nearest marked ancestor (NMA) data structure of Westbrook [14], which supports the following operations in amortized $O(1)$ time using linear space: 1) find the NMA of any node; 2) insert an unmarked node; 3) mark an unmarked node.

When $\mathit{DAWG}(\mathcal{T}')$ is updated to $\mathit{DAWG}(\mathcal{T})$, at most two new primary edges are introduced to $\mathit{DAWG}(\mathcal{T})$, one for the new sink and one for the split node. We insert these new edges into $\mathit{LPT}(\mathcal{T}')$ and obtain $\mathit{LPT}(\mathcal{T})$. Because of these new edges, at most two non-branching nodes of $\mathit{DAWG}(\mathcal{T}')$ can become branching in $\mathit{DAWG}(\mathcal{T})$. We mark their corresponding nodes in $\mathit{LPT}(\mathcal{T})$, and link them to the corresponding suffix tree nodes after we have constructed $\mathit{STree}(\mathcal{T})$. This is because the corresponding nodes of $\mathit{STree}(\mathcal{T}')$ are still non-branching.

We use $\mathit{LPT}(\mathcal{T})$ to quickly move from the DAWG to the suffix tree. Since $\mathit{lrs}_{\mathcal{T}}(T_k a)$ is the longest in $[\mathit{lrs}_{\mathcal{T}}(T_k a)]_{\mathcal{T}}$, there always exists a node $y$ of $\mathit{LPT}(\mathcal{T})$ which represents $\mathit{lrs}_{\mathcal{T}}(T_k a)$. We conduct an NMA query from $y$ on $\mathit{LPT}(\mathcal{T})$, and let $v$ be the NMA of $y$. Let $\ell = |y| - |v|$, and let $c$ be the label of the first edge in the path from $v$ to $y$. We move from $v$ to its corresponding node $x$ in $\mathit{STree}(\mathcal{T}')$. Then, $(x, c, \ell)$ is a reference to $\mathit{lrs}_{\mathcal{T}}(T_k a)$ in $\mathit{STree}(\mathcal{T}')$. Since $v$ is the NMA of $y$ in $\mathit{LPT}(\mathcal{T})$, and since updating $T_k$ to $T_k a$ does not explicitly insert any suffix shorter than $\mathit{lrs}_{\mathcal{T}}(T_k a)$, this reference is canonical by Fact 1.

Clearly the total size of the above data structures is linear in the total length of the texts in $\mathcal{T}$. We analyze the time complexity. Recall Case 2 when updating $\mathit{DAWG}(\mathcal{T}')$ to $\mathit{DAWG}(\mathcal{T})$. At the end of the update, we find (or create) in amortized $O(\log \sigma)$ time the node of $\mathit{DAWG}(\mathcal{T})$ which represents $[\mathit{lrs}_{\mathcal{T}}(T_k a)]_{\mathcal{T}}$. Hence we can find node $y = \mathit{lrs}_{\mathcal{T}}(T_k a)$ in amortized $O(\log \sigma)$ time. Updating $\mathit{LPT}(\mathcal{T}')$ to $\mathit{LPT}(\mathcal{T})$ takes $O(\log \sigma)$ time. Inserting a new node and querying an NMA from a given node takes amortized $O(1)$ time. We can link a new marked node of $\mathit{LPT}(\mathcal{T})$ to the corresponding new branching node of $\mathit{STree}(\mathcal{T})$ in $O(1)$ time, since we can remember this new branching node when updating $\mathit{STree}(\mathcal{T}')$ to $\mathit{STree}(\mathcal{T})$. Hence, the amortized bound is $O(\log \sigma)$. □
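The way a reference is read off the longest path tree can be sketched as follows. This is a simplified stand-in and not the structure used in the proof: the class and method names are hypothetical, the NMA query is a plain upward walk instead of Westbrook's amortized constant-time structure, the root is assumed to be marked, and a marked node is assumed to be its own nearest marked ancestor.

```python
class LongestPathTree:
    """Tree of primary edges; node ids are integers and the root is node 0."""

    def __init__(self):
        self.parent = [None]
        self.char = [None]       # label of the edge from the parent
        self.depth = [0]         # length of the string the node represents
        self.marked = [True]     # assumption: the root counts as marked

    def add_child(self, parent, ch, marked=False):
        """Insert a new node below `parent` via a primary edge labeled ch."""
        self.parent.append(parent)
        self.char.append(ch)
        self.depth.append(self.depth[parent] + 1)
        self.marked.append(marked)
        return len(self.parent) - 1

    def mark(self, node):
        self.marked[node] = True

    def nma(self, node):
        """Nearest marked ancestor by a naive upward walk (not O(1))."""
        while not self.marked[node]:
            node = self.parent[node]
        return node

    def reference(self, y):
        """Return (v, c, ell): v is the NMA of y, c is the first character on
        the path from v to y (None if y == v), and ell = |y| - |v|."""
        v = self.nma(y)
        ell = self.depth[y] - self.depth[v]
        c, node = None, y
        while node != v:
            c = self.char[node]
            node = self.parent[node]
        return v, c, ell


lpt = LongestPathTree()
n1 = lpt.add_child(0, "a", marked=True)   # a branching node
n2 = lpt.add_child(n1, "b")
n3 = lpt.add_child(n2, "c")
print(lpt.reference(n3))
```

In the proof, the marked node $v$ is additionally linked to its counterpart $x$ in the suffix tree, so the triple returned here would be translated into $(x, c, \ell)$.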

To find the insertion point of the shortest Type-2 suffix from the longest Type-3 suffix $\mathit{lrs}_{\mathcal{T}}(T_k a)$, and to insert the Type-2 suffixes of $T_k a$ in increasing order of length, we maintain the labeled reversed suffix links for each explicit node of the suffix tree. Namely, if $\mathit{slink}(bv) = v$ for two nodes $bv, v$ with $v \in \Sigma^*$ and $b \in \Sigma$, let $\mathit{rslink}_b(v) = bv$. We leave $\mathit{rslink}_b(v)$ undefined if $bv$ is not a substring of any text in the collection, or if node $bv$ is implicit in the suffix tree.

A suffix tree oracle for a suffix tree $S$ is a data structure which efficiently answers the following query: given a pair $(v, b)$ of a node $v$ of $S$ and a character $b \in \Sigma$, return the nearest ancestor $u$ of $v$ for which $\mathit{rslink}_b(u)$ is defined. The state-of-the-art suffix tree oracle by Fischer and Gawrychowski [3] answers queries and supports updates in worst-case $O(\log\log n + (\log\log\sigma)^2 / \log\log\log\sigma)$ time each, using $O(n)$ space, where $n$ is the number of leaves in $S$. The next lemma shows our simpler suffix tree oracle with an amortized $O(\log \sigma)$ bound.
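The query itself is easy to state in code, even though answering it fast is the hard part. The sketch below is a naive baseline with hypothetical names: it walks up parent pointers until a node with the requested reversed suffix link is found, which takes time proportional to the depth of the node rather than the bounds quoted above. It assumes that a node counts as its own ancestor.

```python
class Node:
    """Explicit suffix tree node with labeled reversed suffix links."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.rslink = {}         # character b -> node bv, when defined


def oracle_query(v, b):
    """Nearest ancestor u of v (v itself included, by assumption) for which
    rslink_b(u) is defined; None if no such ancestor exists."""
    u = v
    while u is not None and b not in u.rslink:
        u = u.parent
    return u


root = Node("")
a = Node("a", parent=root)
ab = Node("ab", parent=a)
b = Node("b", parent=root)
root.rslink["a"] = a             # slink(a) = root
root.rslink["b"] = b             # slink(b) = root
b.rslink["a"] = ab               # slink(ab) = b
print(oracle_query(ab, "a").name == "")
```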

Lemma 2. For a suffix tree with $n$ leaves, there is a suffix tree oracle of size $O(n)$ which answers each query in amortized $O(\log \sigma)$ time. It takes amortized $O(\log \sigma)$ time to update this suffix tree oracle, per insertion of a new leaf or a new suffix link to the suffix tree.

Proof. (Sketch) We follow the approach by Fischer and Gawrychowski [^. The $\log\log n$ term in the running time of their suffix tree oracle is due to the fringe nearest marked ancestor data structure by Breslauer and Italiano [5], which answers each NMA query in a special case in worst-case $O(\log\log n)$ time. It is possible to replace the fringe nearest marked ancestor data structures with the NMA data structures of Westbrook [14], so that the time cost for each NMA query is amortized $O(1)$. The other $(\log\log\sigma)^2 / \log\log\log\sigma$ term is due to fast predecessor data structures for integer alphabets. Since our alphabet is more general, we use balanced search trees with $O(\log \sigma)$-time operations. Hence our bound is $O(\log \sigma)$ amortized. A complete proof is shown in Appendix B. □

To overcome Difficulty C, we employ lazy maintenance for leaves, namely, we maintain only the first 
character of the label of every edge leading to a leaf. On the other hand, we eagerly maintain the whole 
label of every edge leading to an internal explicit node. The next lemma holds. 

Lemma 3. The lazy representation of the in-coming edges of leaves allows for updating the suffix tree in
amortized $O(\log \sigma)$ time per insertion of a new leaf.

Proof. Let $U[i] = (k, a)$ and $\mathcal{T} = \mathcal{T}_{U[1..i]}$ as previously. Let $xa$ be a Type-2 suffix of the extended text $T_k a$
to be inserted into the suffix tree. Using the suffix tree oracle of Lemma 2, we obtain a canonical reference
$(v, c, \ell)$ to $x$ from which a leaf for the suffix $xa$ is to be inserted.

The difficult case is when $x$ is on the edge $e$ from $v$ to a leaf and $\ell \ge 2$, since we only know the first
character $c$ of the label of $e$. We create a new internal node $x$ on $e$, and create a new leaf as a child of $x$
whose in-coming edge is labeled with the first character $a$. We can determine the label of the in-coming edge of
the new internal explicit node $x$ as follows. Let $y$ be the node of $\mathit{LPT}(\mathcal{T})$ which corresponds to the node
$[v]_{\mathcal{T}}$ of $\mathit{DAWG}(\mathcal{T})$, namely $y = \mathit{long}([v]_{\mathcal{T}})$. We represent the label of each edge of $\mathit{LPT}(\mathcal{T})$ by a pair of
the text id and the position of the character in the text of that id. Let $(h, j)$ be the label of the out-going
edge of node $y$ of $\mathit{LPT}(\mathcal{T})$ such that $T_h[j] = c$. Since we insert the Type-2 suffixes of $T_k a$ in increasing
order of length, the path in $\mathit{LPT}(\mathcal{T})$ of length $\ell$ starting with this edge from $y$ is non-branching. Thus,
we can label the in-coming edge of the new internal node by the triple $(h, j, j + \ell - 1)$. See also Fig. 8 in Appendix A.
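
The triple computed at the end of this argument can be checked on the example of Figure 8. The sketch below only performs the index arithmetic; the helper name `internal_edge_label` and the dictionary of texts are hypothetical, and positions are 1-based and inclusive as in the paper.

```python
def internal_edge_label(texts, h, j, ell):
    """Return the triple (h, j, j + ell - 1) and the substring it denotes.

    texts maps a 1-based text id to its string; positions are 1-based and
    inclusive, so the label is T_h[j .. j + ell - 1].
    """
    triple = (h, j, j + ell - 1)
    label = texts[h][j - 1 : j + ell - 1]
    return triple, label


# Figure 8: T_1 d = ababd, T_2 = aaab, T_3 = ababc; the LPT edge label is
# (h, j) = (3, 3) and the canonical reference (b, a, 2) gives ell = 2.
texts = {1: "ababd", 2: "aaab", 3: "ababc"}
triple, label = internal_edge_label(texts, h=3, j=3, ell=2)
```

Here `label` starts with the character `a`, which is the first character `c` of the canonical reference, as the proof requires.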

While updating $\mathit{DAWG}(\mathcal{T}')$ to $\mathit{DAWG}(\mathcal{T})$, we have visited the node $[x]_{\mathcal{T}}$. We can obtain node $y$ on
$\mathit{LPT}(\mathcal{T})$ by an NMA query from node $\mathit{long}([x]_{\mathcal{T}})$, and associate to $y$ each Type-2 suffix $xa$ of $T_k a$ whose
length is in the range $[s + 1, l + 1]$, where $s$ and $l$ are the lengths of the shortest and longest members of
$[x]_{\mathcal{T}}$, respectively. As we insert the Type-2 suffixes of $T_k a$ into the suffix tree in increasing order of length,
for each Type-2 suffix $xa$ we can access its corresponding node $y$ in amortized $O(\log \sigma)$ time. It takes
amortized $O(\log \sigma)$ time to query the suffix tree oracle by Lemma 2. All the other operations take $O(1)$
time each. □

Assume we are searching a growing text collection $\mathcal{T}$ for a given pattern $P$. If we get stuck at the parent
node $u$ of a leaf in $\mathit{STree}(\mathcal{T})$ due to our lazy leaf representation, then we can move to the DAWG node
which corresponds to $u$ via $\mathit{LPT}(\mathcal{T})$, and continue searching for $P$ on $\mathit{DAWG}(\mathcal{T})$. This
way we can find the locus of $P$ on $\mathit{STree}(\mathcal{T})$ in optimal $O(M \log \sigma)$ time, where $M = |P|$. Also, since the
tree topology is correctly maintained with our lazy leaf representation, semi-dynamic NMA [14], LCA [7],
and LA [1] queries can be correctly supported in $O(1)$ time on our suffix tree representation.

Theorem 3. Given a fully-online sequence $U$ of $N$ update operators for a collection of $K$ texts, we can
update $\mathit{STree}(\mathcal{T}_{U[1..i]})$ for $i = 1, \ldots, N$ in a total of $O(N \log \sigma)$ time and $O(N)$ space.

A snapshot of fully-online suffix tree construction is shown in Fig. 9 of Appendix A. After the whole
$U$ has been processed, we determine the triples representing the entire labels of the in-coming edges of all
leaves of $\mathit{STree}(\mathcal{T}_U)$ in a total of $O(N)$ time. We can then discard $\mathit{DAWG}(\mathcal{T}_U)$ and $\mathit{LPT}(\mathcal{T}_U)$.

A Appendix (Figures)

In this appendix, we show some supplemental figures which support understanding of the contents in the 
main body of this paper. 


Figure 2: Illustration for the Type-1, Type-2, and Type-3 suffixes of $T_k a$.



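Figure 3: Illustration for node split in updating $\mathit{DAWG}(\mathcal{T}')$ to $\mathit{DAWG}(\mathcal{T})$. The edge labeled with $a$
leading to $u = [\mathit{lrs}_{\mathcal{T}'}(T_k a)]_{\mathcal{T}'}$ is secondary, and hence $u$ in $\mathit{DAWG}(\mathcal{T}')$ is split into two nodes $z = [Xa]_{\mathcal{T}}$ and
$w = [\mathit{lrs}_{\mathcal{T}}(T_k a)]_{\mathcal{T}}$ in $\mathit{DAWG}(\mathcal{T})$. The out-going edges of $u$ are copied for $w$. The suffix links that point to $u$
in $\mathit{DAWG}(\mathcal{T}')$ point to $z$ in $\mathit{DAWG}(\mathcal{T})$, and the suffix link from $u$ in $\mathit{DAWG}(\mathcal{T}')$ is from $w$ in $\mathit{DAWG}(\mathcal{T})$.
The suffix link from $z$ is set to $w$. The time cost required for this node split is linear in the number of new
nodes, edges, and suffix links.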

Figure 4: Illustration for Case II-b-(ii), where the suffix tree is being updated by an update operator $(k, a)$.
To the left is a part of the suffix tree just before the leaf corresponding to the $j$th suffix of $T_k a$ is going
to be inserted from the active point $x$. Since there is not yet a suffix link from the locus for $x$, we move to
the next active point $x'$ via $\mathit{slink}(v_j)$, going down along the corresponding path from $\mathit{slink}(v_j)$ to $v_{j+1}$. To
the right is a part of the suffix tree after the leaves corresponding to the $j$th and $(j+1)$th suffixes of $T_k a$
have been inserted. The amount of work here is $O((\ell_j - \ell_{j+1} + 1) \log \sigma)$.


Figure 5: A snapshot of fully-online DAWG construction, where we update $\mathit{DAWG}(\mathcal{T}')$ to $\mathit{DAWG}(\mathcal{T})$
with $\mathcal{T}' = \{T_1 = \mathtt{abab}, T_2 = \mathtt{aaab}\}$ and $\mathcal{T} = \{T_1 \mathtt{b}, T_2\}$. We insert the suffixes of $T_1 \mathtt{b}$ as follows. Type-1
suffixes ababb and babb are inserted by a new edge labeled b from the active point to the new sink. The
active point moves via the suffix link, and Type-2 suffixes abb and bb are inserted by another new edge b
from the active point to the new sink. The active point moves via the suffix link again, and the longest
Type-3 suffix b is found. Since the edge from the source to the node is secondary, the node is separated
into two nodes.
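
The classification in Figure 5 can be reproduced by brute force. The sketch below is not the paper's algorithm. It assumes, as an illustration only, that $\mathit{lrs}$ denotes the longest suffix occurring at least twice in the collection, that Type-1 suffixes are those longer than $\mathit{lrs}_{\mathcal{T}'}(T_k)a$, that Type-3 suffixes are those no longer than $\mathit{lrs}_{\mathcal{T}}(T_k a)$, and that Type-2 suffixes are the rest. All function names are hypothetical.

```python
def count_occurrences(texts, s):
    """Number of (possibly overlapping) occurrences of s over all texts."""
    return sum(
        1
        for t in texts
        for i in range(len(t) - len(s) + 1)
        if t.startswith(s, i)
    )


def lrs(texts, k):
    """Longest suffix of texts[k] occurring at least twice in the collection."""
    t = texts[k]
    for start in range(len(t) + 1):
        if count_occurrences(texts, t[start:]) >= 2:
            return t[start:]
    return ""


def classify_suffixes(texts, k, a):
    """Split the suffixes of texts[k] + a into Type-1, Type-2 and Type-3."""
    before = lrs(texts, k)              # lrs of T_k in the old collection
    updated = list(texts)
    updated[k] = texts[k] + a
    after = lrs(updated, k)             # lrs of T_k a in the new collection
    grown = updated[k]
    types = {1: [], 2: [], 3: []}
    for start in range(len(grown)):
        suffix = grown[start:]
        if len(suffix) > len(before) + 1:
            types[1].append(suffix)
        elif len(suffix) > len(after):
            types[2].append(suffix)
        else:
            types[3].append(suffix)
    return types


# Figure 5: T' = {abab, aaab}, update operator (1, b); index 0 stands for T_1.
types = classify_suffixes(["abab", "aaab"], k=0, a="b")
```

This quadratic-time check is only meant to confirm the three groups listed in the caption.
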

Figure 6: Consider a text collection $\mathcal{T}$ with two texts which grow according to the fully-online sequence
$U = (1, a), (1, b), (2, b), (1, a), (2, a), (1, b), (1, c), (2, b), (2, c), (2, d)$ of 10 update operators. To the left is
$\mathit{STree}(\mathcal{T}_{U[1..8]})$, where the active point for $T_1 = \mathtt{ababc}$ is on the root and that for $T_2 = \mathtt{bab}$ is on the
implicit node bab. The numbers 1 and 2 shown on $\mathit{STree}(\mathcal{T}_{U[1..8]})$ indicate the loci of the suffixes of $T_1$
and $T_2$, respectively. In $\mathit{STree}(\mathcal{T}_{U[1..8]})$, the labels of the in-coming edges to the leaves corresponding to
babc, abc, and bc are represented by triples $(1, 3, \infty)$, $(1, 5, \infty)$, and $(1, 5, \infty)$, respectively. To the right
is $\mathit{STree}(\mathcal{T}_{U[1..10]})$, where the 2nd text $T_2$ has been updated from bab to babcd. Due to this update to
$T_2$, the locus of the active point of $T_1 = \mathtt{ababc}$ has been changed to the implicit node babc (Difficulty
A). Moreover, due to this update to $T_2$, the leaves representing babc, abc, and bc have been respectively
extended to represent babcd, abcd, and bcd. Hence, the triples for the labels of their in-coming edges
have to be updated to $(2, 2, \infty)$, $(2, 4, \infty)$, and $(2, 4, \infty)$, respectively (Difficulty C).
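
The texts and leaf labels in this caption can be replayed with a short sketch. It only applies the update operators and resolves open-ended triples $(h, j, \infty)$ against the current texts; the helper names `apply_updates` and `leaf_label` are hypothetical.

```python
def apply_updates(updates, num_texts):
    """Apply fully-online update operators (k, a); k is a 1-based text id.

    Returns the collection after each operator as a list of tuples.
    """
    texts = [""] * num_texts
    snapshots = []
    for k, a in updates:
        texts[k - 1] += a
        snapshots.append(tuple(texts))
    return snapshots


def leaf_label(texts, h, j):
    """Resolve an open-ended triple (h, j, infinity) to T_h[j..|T_h|]."""
    return texts[h - 1][j - 1:]


U = [(1, "a"), (1, "b"), (2, "b"), (1, "a"), (2, "a"),
     (1, "b"), (1, "c"), (2, "b"), (2, "c"), (2, "d")]
snapshots = apply_updates(U, num_texts=2)
after_8, after_10 = snapshots[7], snapshots[9]
```

Resolving $(1, 3, \infty)$ on `after_8` gives `abc`, while $(2, 2, \infty)$ on `after_10` gives `abcd`, which is why the leaf labels must switch to the second text once it grows.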


Figure 7: Illustration for $\mathit{DAWG}(\mathcal{T})$, $\mathit{LPT}(\mathcal{T})$, and $\mathit{STree}(\mathcal{T}')$, where $\mathcal{T}' = \{T_1 = \mathtt{aaab}, T_2 = \mathtt{ababc}, T_3 =
\mathtt{ab}\}$ and $\mathcal{T} = \{T_1 \mathtt{c}, T_2, T_3\}$. The bold solid arrows represent the primary edges of $\mathit{DAWG}(\mathcal{T})$, the gray
nodes are the marked nodes of $\mathit{LPT}(\mathcal{T})$, and the dashed arrows represent the links between the marked
nodes of $\mathit{LPT}(\mathcal{T})$ and the corresponding branching nodes of $\mathit{STree}(\mathcal{T}')$. $\mathit{lrs}_{\mathcal{T}}(T_1 \mathtt{c}) = \mathtt{abc}$, and hence we
perform an NMA query from node abc on $\mathit{LPT}(\mathcal{T})$, obtaining node ab. We then access the suffix tree
node ab using the pointer from $\mathit{LPT}(\mathcal{T})$, and obtain the locus of abc on $\mathit{STree}(\mathcal{T}')$.

Figure 8: Illustration of how to determine the label of the in-coming edge of a new internal explicit node
which is created on an edge leading to an existing leaf. Let $\mathcal{T}' = \{T_1 = \mathtt{abab}, T_2 = \mathtt{aaab}, T_3 = \mathtt{ababc}\}$,
and $\mathcal{T} = \{T_1 \mathtt{d}, T_2, T_3\}$. Now we are inserting a new leaf w.r.t. the Type-2 suffix babd of $T_1 \mathtt{d}$. The canonical
reference to the insertion point of this suffix is $(\mathtt{b}, \mathtt{a}, 2)$, and hence we create a new internal node in the
middle of the out-going edge of node b whose edge label begins with a. Now, since $\mathit{long}([\mathtt{b}]_{\mathcal{T}}) = \mathtt{ab}$, we
access the LPT node $y = \mathtt{ab}$. Since the label a of the out-going edge of $y$ in $\mathit{LPT}(\mathcal{T})$ is now represented by
the pair $(3, 3)$, we can label the new suffix tree edge leading to the new internal node by $(3, 3, 3 + 2 - 1) = (3, 3, 4)$.

Figure 9: A snapshot of fully-online suffix tree construction, where we update $\mathit{STree}(\mathcal{T}')$ to $\mathit{STree}(\mathcal{T})$ with
$\mathcal{T}' = \{T_1 = \mathtt{abab}, T_2 = \mathtt{aaab}\}$ and $\mathcal{T} = \{T_1 \mathtt{b}, T_2\}$. Recall that we employ lazy maintenance of the leaves,
and hence each character within a box is only imaginary and is not computed during the updates. Due
to the lazy representation of leaves, we do nothing to insert the Type-1 suffixes of $T_1 \mathtt{b}$. To start inserting
the Type-2 suffixes in increasing order of length, we first find the longest Type-3 suffix b via $\mathit{LPT}(\mathcal{T})$
using Lemma 1. We insert the shortest Type-2 suffix bb. Using Lemma 3 and Remark 1 in Appendix B,
we find the edge whose label begins with b from the root, and create a new internal node in the middle
of this edge. After creating a new leaf from the new internal node and its in-coming edge with the first
character label b, we determine the label of the in-coming edge of the new internal node using Lemma 2.
The reversed suffix link is set from the root to this new internal node b. The next Type-2 suffix is abb,
and hence we query $(v, \mathtt{a})$ to our suffix tree oracle of Lemma 2, where $v$ is the node representing b, and obtain
node a. We find the edge whose label begins with b from this node, and create a new internal node in
the middle of this edge. After creating a new leaf from the new internal node and its in-coming edge with
the first character label b, we determine the label of the in-coming edge of the new internal node using
Lemma 2. The reversed suffix link is set from node b to this new internal node ab. Since we have inserted
all the Type-2 suffixes, the update finishes.






B Appendix (Proof of Lemma 2)

In this appendix, we show a complete proof of Lemma 2.
We use the following known result in our proof:

Lemma 4 (Lowest common ancestor (LCA) on semi-dynamic tree [7]). A semi-dynamic rooted tree can
be maintained in space linear in its size so that the following operations are supported in worst-case $O(1)$
time: 1) find the lowest common ancestor (LCA) of any two nodes; 2) insert a new node.

We are ready to show Lemma 2.

Proof. The design of our suffix tree oracle follows the data structure by Fischer and Gawrychowski [5], but
ours is much simpler since an amortized $O(\log \sigma)$ bound is enough for our goal. We define the weight of
each node $v$ of the suffix tree, denoted $w(v)$, to be the sum of the number of leaves in the subtree rooted
at $v$ and the number of reversed suffix links defined in the subtree. A node $v$ is called heavy if $w(v) \ge 2\sigma$,
and is called light if $w(v) < \sigma$. A node $v$ with $\sigma \le w(v) < 2\sigma$ can be either light or heavy. Clearly, if a
node is heavy, then all its ancestors are heavy. A heavy node $v$ is called a heavy leaf if no children of $v$
are heavy, and it is called a heavy branching node if at least two children of $v$ are heavy. See also Fig. 10.



Figure 10: Illustration for heavy nodes, light trees, and induced heavy tree on a suffix tree. The circles 
represent heavy nodes, while the white triangles represent light trees. The gray nodes are heavy leaves, 
and the black nodes are branching heavy nodes. The induced heavy tree is a tree consisting only of these 
black and gray nodes. 
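
The weight definition and the notions of heavy leaf and heavy branching node can be made concrete on a toy tree. The sketch below is a static snapshot only: it treats a node as heavy exactly when $w(v) \ge 2\sigma$, whereas in the dynamic structure a node with $\sigma \le w(v) < 2\sigma$ may be either. The tree, the `rslink_count` table (number of reversed suffix links defined at each node), and the function names are hypothetical.

```python
def subtree_weights(children, rslink_count, root):
    """w(v) = (#leaves below v) + (#reversed suffix links defined below v)."""
    w = {}

    def visit(v):
        kids = children.get(v, [])
        # A node without children is a leaf and contributes 1 on its own.
        total = rslink_count.get(v, 0) + (0 if kids else 1)
        total += sum(visit(c) for c in kids)
        w[v] = total
        return total

    visit(root)
    return w


def heavy_leaves_and_branching(children, heavy):
    """Heavy leaves have no heavy child; heavy branching nodes have >= 2."""
    def heavy_kids(v):
        return [c for c in children.get(v, []) if c in heavy]

    leaves = {v for v in heavy if len(heavy_kids(v)) == 0}
    branching = {v for v in heavy if len(heavy_kids(v)) >= 2}
    return leaves, branching


sigma = 2
children = {"r": ["x", "y"], "x": ["l1", "l2", "l3"],
            "y": ["l4", "l5", "l6", "l7"]}
rslink_count = {"x": 1}
w = subtree_weights(children, rslink_count, "r")
heavy = {v for v in w if w[v] >= 2 * sigma}
leaves, branching = heavy_leaves_and_branching(children, heavy)
```

In this toy tree the induced heavy tree would consist of the branching node `r` and the heavy leaves `x` and `y`.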

First, we show a suffix tree oracle for heavy nodes. We maintain a tree called the induced heavy tree
over the suffix tree which consists only of the heavy leaves and the heavy branching nodes. Since there
are only $O(n/\sigma)$ heavy leaves, the total size of the induced heavy tree is $O(n/\sigma)$. From each heavy node
of the suffix tree, we maintain a pointer to its corresponding edge in the induced heavy tree. For each
edge $e$ of the induced heavy tree, if there is a suffix tree node $v$ associated to $e$ with $\mathit{rslink}_b(v)$ defined for
character $b \in \Sigma$, then we maintain an invariant $\mathit{lowest}_e(b)$ which indicates the lowest node associated to $e$
for which $\mathit{rslink}_b(v)$ is defined. In each edge, we maintain these invariants for all characters by a balanced
binary search tree. Since the size of a balanced binary search tree is $O(\sigma)$, the total space for all edges of
the induced heavy tree is $O(\sigma \times n/\sigma) = O(n)$. We also maintain $\sigma$ NMA data structures of Westbrook [14]
over the induced heavy tree: a node $u$ in the induced heavy tree is marked in the NMA data structure
for character $b \in \Sigma$ iff $\mathit{lowest}_e(b)$ is defined, where $e$ is the in-coming edge to $u$. Note that the total size
for all $\sigma$ NMA data structures is $O(\sigma \times n/\sigma) = O(n)$ as well. Given a query $(v, b)$ to the suffix tree oracle
where $v$ is a heavy node of the suffix tree, we first access the edge $e$ of the induced heavy tree with
which $v$ is associated. There are three cases:

(1) If $\mathit{lowest}_e(b)$ is an ancestor of $v$, then $\mathit{lowest}_e(b)$ is the answer.

(2) If $\mathit{lowest}_e(b)$ is a descendant of $v$, then $v$ is the answer.

(3) If $\mathit{lowest}_e(b)$ is not defined, then we take the branching node $u$ of the induced heavy tree of which $e$
is an out-going edge. We perform an NMA query from $u$ using the NMA data structure associated
with $b$, and then this case reduces to either case (1) or case (2).

This suffix tree oracle answers a query in amortized $O(\log \sigma)$ time, since we need amortized $O(1)$ time for
each NMA query, and $O(\log \sigma)$ time to search for $\mathit{lowest}_e(b)$ in the balanced search tree and to access the
NMA data structure for character $b$.

Second, we show a suffix tree oracle for light nodes. Each maximal subtree consisting only of light
nodes is called a light tree. Clearly, the total number of nodes and reversed suffix links defined in each light
tree is at most $2\sigma - 1$. For each light tree, we maintain a simple suffix tree oracle proposed by Fischer and
Gawrychowski which answers queries in $O(\log \sigma)$ time: Consider any light tree $LT$. For each character $b$
we maintain a preorder traversal of $LT$ which contains all and only the nodes $x$ in $LT$ such that $\mathit{rslink}_b(x)$
is defined. Then, for any query $(v, b)$, $u$ is the nearest marked ancestor of $v$ with $\mathit{rslink}_b(u)$ defined iff $u$ is
the predecessor of $v$ in the preorder traversal for character $b$. Since comparing two elements there reduces
to computing their LCA, we can use the dynamic LCA data structure of Lemma 4. Thus, by maintaining
a balanced search tree which stores the preorder traversal of $LT$ for each character $b$, we can compute the
predecessor in $O(\log \sigma)$ time. The total size of the balanced search trees for all characters is linear in the
number of reversed suffix links defined in $LT$, which is $O(\sigma)$.



Figure 11: As soon as the weight of the root $r$ of a light tree $LT$ reaches $2\sigma$, we take a maximal path
$s_1, \ldots, s_h$ of heavy nodes of weight at least $\sigma$ from $r$. All these nodes $s_1, \ldots, s_h$ are then associated to an
edge of the induced heavy tree. Since the weight of $r$ is $2\sigma$, all the new light trees are of weight at most $\sigma$.

What remains is how to update the suffix tree oracle when a new leaf or a new reversed suffix link is
inserted into the suffix tree. Assume that a new leaf or a new reversed suffix link is inserted into a light tree
$LT$ of size $2\sigma - 1$. The weight $w(r)$ of the root $r$ of $LT$ is now $2\sigma$, meaning that the root $r$ becomes a
heavy node. We take a maximal path $s_1, \ldots, s_h$ starting from the root $r = s_1$ such that $w(s_i) \ge \sigma$ for all
$1 \le i \le h$ (see also Fig. 11 for illustration). We create pointers from these nodes to an edge of the induced
heavy tree, so $s_h$ becomes a new heavy leaf. Let $p(s_1)$ be the parent of $s_1$ in the original suffix tree. We
update the induced heavy tree as follows.

(a) If $p(s_1)$ is already a heavy branching node, then we create a new edge from $p(s_1)$ to $s_h$ in the induced
heavy tree and make pointers from $s_1, \ldots, s_h$ to this edge.

(b) If $p(s_1)$ is a heavy leaf, then we create pointers from $s_1, \ldots, s_h$ to the in-coming edge of $p(s_1)$ in the
induced heavy tree. This "extends" the in-coming edge of the induced heavy tree to the new heavy
leaf $s_h$.

(c) If $p(s_1)$ has just become a new heavy branching node because of the new heavy leaf $s_h$, then $p(s_1)$
becomes a new internal node of the induced heavy tree. Let $e$ be the original edge split by $p(s_1)$.
Note that we need to update the pointers to $e$. To do this efficiently, we use the "take the smaller"
strategy: If at most half of the pointers from the suffix tree nodes to $e$ are associated to the upper
split part of $e$, then we redirect the pointers to the upper part of $e$ to a newly created edge $e'$ which
is now the upper split part of $e$. We shorten $e$ by moving its starting point to $p(s_1)$, so that $e$ is now
the lower split part. The cost of redirecting the pointers in the "smaller" part can be charged to
the unredirected pointers which remain in $e$ (the "larger" part), and hence the amortized cost for
redirection per pointer is $O(1)$. The other case can be treated symmetrically. Finally, we create a
new edge from $p(s_1)$ to $s_h$ in the induced heavy tree and make pointers from $s_1, \ldots, s_h$ to this new
edge.
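
The pointer redirection of case (c) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the suffix tree nodes pointing to the split edge are assumed to be available as a top-to-bottom list `path`, the split position is given as an index, and the names `split_edge`, `edge_of`, `old_edge`, and `new_edge` are hypothetical.

```python
def split_edge(edge_of, path, split_index, old_edge, new_edge):
    """Split the induced-heavy-tree edge old_edge, moving the smaller side.

    path lists the suffix tree nodes pointing to old_edge from top to bottom;
    path[:split_index] is the upper part (ending at the new branching node).
    Only the pointers of the smaller part are redirected to new_edge.
    Returns which side now points to new_edge and how many pointers moved.
    """
    upper_size = split_index
    lower_size = len(path) - split_index
    if upper_size <= lower_size:
        side, indices = "upper", range(0, split_index)
    else:
        side, indices = "lower", range(split_index, len(path))
    for i in indices:
        assert edge_of[path[i]] == old_edge
        edge_of[path[i]] = new_edge
    return side, len(indices)


# Five nodes on one edge; the new branching node is the second one.
path = ["v1", "v2", "v3", "v4", "v5"]
edge_of = {v: "e" for v in path}
result_upper = split_edge(edge_of, path, 2, "e", "e_new")

# Symmetric case: the new branching node is the fourth node of a fresh edge.
edge_of2 = {v: "e" for v in path}
result_lower = split_edge(edge_of2, path, 4, "e", "e_new")
```

In both runs the loop touches at most half of the pointers, which is the property the charging argument in case (c) relies on.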

It takes amortized $O(\log \sigma)$ time to update the NMA data structures and balanced search trees for
$\mathit{lowest}(\cdot)$ for the split edge and the new edge in the induced heavy tree. We update the light trees as
follows. By taking the nodes $s_1, \ldots, s_h$ from the original light tree $LT$, a number of light trees are created.
We reconstruct the suffix tree oracle for each of these light trees. This takes time linear in the size of each
tree, and since the size of each tree is at most $\sigma$, this takes $O(\sigma)$ time. We can charge this cost to the $\sigma$
new leaves and reversed suffix links to be inserted into each light tree, which will make the root of this light
tree a heavy one. Thus, the amortized cost for reconstructing the suffix tree oracle for the light trees is
$O(1)$. Thus, it requires amortized $O(\log \sigma)$ time to update our suffix tree oracle per insertion of a new leaf
or a new suffix link.

In the above description, we have assumed that the alphabet size $\sigma$ is known beforehand. If this is not
the case, then we can reconstruct the suffix tree oracle each time the alphabet size doubles due to the growth
of the texts in the collection. Since the number of distinct characters in the texts is at most the total
length of the texts, the amortized cost of the reconstruction is $O(\log \sigma)$. □

Remark 1. Consider an update operator $(k, a)$ to the text collection. Recall that we want to insert the
Type-2 suffixes of $T_k a$ into the suffix tree in increasing order of length. Let $xa$ be either the longest Type-3
suffix of $T_k a$ or any Type-2 suffix of $T_k a$. Let $v$ be the lowest branching ancestor of $x$, and let $bxa$ be the
next Type-2 suffix of $T_k a$ to be inserted into the suffix tree, where $b \in \Sigma$. Our suffix tree oracle described
above covers the case where $\mathit{rslink}_b(u)$ is defined for some ancestor $u$ of $v$, but does not cover the other
case where $\mathit{rslink}_b(u)$ is undefined for every ancestor $u$ of $v$. However, in this case we can easily access the
locus of $bx$ in $O(\log \sigma)$ time: First, we move to the root of the suffix tree, and then take its out-going edge
whose label begins with $b$. By assumption, there is no explicit node on the path between the root and the
implicit node $bx$, and hence we can obtain the locus for $bx$ with simple arithmetic.